Building a Trajectory Regulatory Network with scRNA-seq Data

Author
Affiliation

Jack Leary

University of Florida

Published

January 21, 2024

1 Introduction

In this analysis we’ll be looking at cells undergoing neurogenesis in the developing human frontal cortex. The data are taken from Nowakowski et al (2017). Our goal is to build a simple method for fitting single cell gene regulatory networks (GRNs) and compare the results across pseudotime lineages, keeping in mind the trajectory structure of the data. If you want to skip straight to the analysis, head to Section 6.

2 Libraries

We start by loading in a few necessary packages & resolving a few function conflicts.

Code
library(dplyr)                 # data manipulation
library(Seurat)                # scRNA-seq tools
library(scLANE)                # trajectory DE
library(igraph)                # graph utilities
library(ggplot2)               # pretty plots
library(biomaRt)               # gene annotation
library(foreach)               # parallel loops
library(ggnetwork)             # plotting graphs
library(slingshot)             # pseudotime estimation
library(patchwork)             # plot alignment
library(SingleCellExperiment)  # scRNA-seq data structures
rename <- dplyr::rename
select <- dplyr::select

3 Helper functions

We define a helper function to make our legends look prettier.

Code
guide_umap <- function(key.size = 4) {
  ggplot2::guides(color = ggplot2::guide_legend(override.aes = list(size = key.size, 
                                                                    alpha = 1, 
                                                                    stroke = 0.25)))
}

Next, we assign a few consistent color palettes that we’ll use throughout.

Code
palette_heatmap <- paletteer::paletteer_d("MetBrewer::Hiroshige", direction = -1)
palette_celltype <- paletteer::paletteer_d("ggsci::category10_d3")
palette_cluster <- paletteer::paletteer_d("ggsci::default_locuszoom")
palette_timepoint <- paletteer::paletteer_d("ggsci::default_igv")

4 Data

Unfortunately the data that are made available as part of the scRNAseq package come in a transcripts-per-million (TPM) normalized matrix - not raw counts. A rough conversion back to the raw integer counts is to multiply each cell’s normalized expression vector by the TPM size factor. This gets us most of the way there but the results are still estimates, so we round them to the nearest integer.

Note

This conversion ignores gene length, which is why the results are estimates & require rounding. For a more detailed discussion on how to un-normalize TPM counts see this StackOverflow post. While the original authors used a plate-based protocol (Fluidigm C1) they do not mention whether they used unique molecular identifiers (UMIs); if UMIs were used then gene length normalization isn’t necessary. Since simply reversing the TPM transformation doesn’t result in integer values I would guess that UMIs were not used, though their Supplementary Methods don’t mention gene length normalization.

Code
sce_cortex <- scRNAseq::NowakowskiCortexData()
counts_matrix <- Matrix::t(sce_cortex@assays@data$tpm)
scale_factors <- Matrix::rowSums(counts_matrix) / 1e6
counts_matrix <- round(Matrix::t(counts_matrix) * scale_factors)
seu_cortex <- CreateSeuratObject(counts_matrix, 
                                 assay = "RNA", 
                                 meta.data = as.data.frame(colData(sce_cortex)), 
                                 project = "cortex", 
                                 min.cells = 5,
                                 min.features = 0)
excitatory_neuron_celltypes <- c("nEN-late", "nEN-early1", "nEN-early2",
                                 "EN-V1-1", "EN-V1-2", "EN-V1-3", 
                                 "EN-PFC1", "EN-PFC2", "EN-PFC3")
seu_cortex <- seu_cortex[, seu_cortex$WGCNAcluster %in% excitatory_neuron_celltypes]
seu_cortex@meta.data <- mutate(seu_cortex@meta.data, 
                               celltype_original = case_when(WGCNAcluster == "nEN-early1" ~ "Early Exc. Neuron 1", 
                                                             WGCNAcluster == "nEN-early2" ~ "Early Exc. Neuron 2", 
                                                             WGCNAcluster == "nEN-late" ~ "Late Exc. Neuron", 
                                                             WGCNAcluster == "EN-V1-1" ~ "Exc. Neuron 1", 
                                                             WGCNAcluster == "EN-V1-2" ~ "Exc. Neuron 2", 
                                                             WGCNAcluster == "EN-V1-3" ~ "Exc. Neuron 3", 
                                                             WGCNAcluster == "EN-PFC1" ~ "PFC Exc. Neuron 1", 
                                                             WGCNAcluster == "EN-PFC2" ~ "PFC Exc. Neuron 2", 
                                                             WGCNAcluster == "EN-PFC3" ~ "PFC Exc. Neuron 3", 
                                                             TRUE ~ NA_character_))

5 Preprocessing

We run the cells through a basic preprocessing pipeline: QC, normalization, highly variable gene (HVG) identification, PCA, UMAP, and graph-based Leiden clustering.

Code
seu_cortex <- seu_cortex %>% 
              PercentageFeatureSet(pattern = "^MT-", col.name = "percent_mito") %>% 
              PercentageFeatureSet(pattern = "^RP[SL]", col.name = "percent_ribo") %>% 
              NormalizeData(verbose = FALSE) %>% 
              FindVariableFeatures(nfeatures = 3000, verbose = FALSE) %>% 
              ScaleData(verbose = FALSE) %>% 
              RunPCA(npcs = 30, 
                     approx = TRUE, 
                     seed.use = 312, 
                     verbose = FALSE) %>% 
              CellCycleScoring(s.features = cc.genes.updated.2019$s.genes, 
                               g2m.features = cc.genes.updated.2019$g2m.genes, 
                               set.ident = FALSE) %>% 
              RunUMAP(reduction = "pca", 
                      dims = 1:30,
                      n.neighbors = 20, 
                      n.components = 2, 
                      metric = "cosine", 
                      n.epochs = 750,
                      seed.use = 312, 
                      verbose = FALSE) %>% 
              FindNeighbors(reduction = "pca", 
                            k.param = 20,
                            nn.method = "annoy", 
                            annoy.metric = "cosine", 
                            verbose = FALSE) %>% 
              FindClusters(algorithm = 4, 
                           method = "igraph", 
                           resolution = 0.5, 
                           random.seed = 312, 
                           verbose = FALSE)

Here we examine some of our dataset’s attributes - the original celltype annotations, our new clusters, age in post-conception weeks (PCW), and the proportion of each cell’s reads aligning to ribosomal genes.

Code
p1 <- DimPlot(seu_cortex, 
              group.by = "celltype_original",
              pt.size = 1.25,  
              alpha = 0.75) + 
      scale_color_manual(values = palette_celltype) + 
      theme_scLANE(umap = TRUE) + 
      labs(color = "Celltype") + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank()) + 
      guide_umap()
p2 <- FeaturePlot(seu_cortex, 
                  features = "Age_in_Weeks", 
                  pt.size = 1.25,  
                  alpha = 0.75) + 
      scale_color_gradientn(colors = palette_heatmap) + 
      labs(color = "Age (weeks)") + 
      theme_scLANE(umap = TRUE) + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank())
p3 <- DimPlot(seu_cortex, 
              group.by = "seurat_clusters", 
              pt.size = 1.25,  
              alpha = 0.75) + 
      scale_color_manual(values = palette_cluster) + 
      labs(color = "Leiden Cluster") + 
      theme_scLANE(umap = TRUE) + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank()) + 
      guide_umap()
p4 <- FeaturePlot(seu_cortex, 
                  features = "percent_ribo", 
                  pt.size = 1.25,  
                  alpha = 0.75) + 
      scale_color_gradientn(colors = palette_heatmap, labels = scales::label_percent(accuracy = 1, scale = 1)) + 
      labs(color = "Ribosomal\nReads") + 
      theme_scLANE(umap = TRUE) + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank())
p5 <- (p1 / p2) | (p3 / p4)
p5 <- ggpubr::annotate_figure(ggpubr::ggarrange(p5), 
                              bottom = "UMAP 1",
                              left = "UMAP 2")
p5
Figure 1: Characteristics of the cortical development dataset

6 Analysis

6.1 Celltype annotation

We begin by running a basic differential expression (DE) testing routine to identify some potential marker genes for each cluster. This will help us assign a celltype identity to each cluster.

Code
cluster_markers <- FindAllMarkers(seu_cortex, 
                                  assay = "RNA", 
                                  logfc.threshold = 0.25, 
                                  test.use = "wilcox", 
                                  slot = "data", 
                                  min.pct = 0.1, 
                                  only.pos = TRUE, 
                                  verbose = FALSE, 
                                  random.seed = 312)
top5_cluster_markers <- arrange(cluster_markers, p_val_adj) %>% 
                        with_groups(cluster, 
                                    slice_head, 
                                    n = 5)

We visualize the markers for each cluster with a dotplot, which shows clear differences between clusters. For example, the transcription factor (TF) aristaless related homeobox (ARX) is specifically expressed in cluster 7; this TF’s expression is known to be necessary for healthy neuronal development and helps to regulate differentiation.

Code
p6 <- DotPlot(seu_cortex, 
              features = unique(top5_cluster_markers$gene),
              assay = "RNA", 
              group.by = "seurat_clusters", 
              dot.scale = 5,
              cols = paletteer::paletteer_d("wesanderson::Zissou1")[c(1, 5)], 
              scale.by = "radius") + 
      coord_flip() +
      labs(y = "Leiden Cluster") +
      theme_scLANE() + 
      theme(axis.title.y = element_blank(), 
            axis.text.y = element_text(face = "italic"))
p6
Figure 2: Putative marker genes for each cluster

Now we identify markers for each celltype:

Code
Idents(seu_cortex) <- "celltype_original"
celltype_markers <- FindAllMarkers(seu_cortex, 
                                   assay = "RNA", 
                                   logfc.threshold = 0.25, 
                                   test.use = "wilcox", 
                                   slot = "data", 
                                   min.pct = 0.1, 
                                   only.pos = TRUE, 
                                   verbose = FALSE, 
                                   random.seed = 312)
top3_celltype_markers <- arrange(celltype_markers, p_val_adj) %>% 
                         with_groups(cluster, 
                                     slice_head, 
                                     n = 3)

The markers seem consistent with what we’d expect for excitatory neurons; for more information see e.g., Figure 4 and Suppl. Figures 4 & 11 in the original publication.

Code
select(top3_celltype_markers, 
       celltype = cluster, 
       gene, 
       avg_log2FC, 
       p_val_adj) %>% 
  kableExtra::kbl(digits = 3, 
                  booktabs = TRUE, 
                  col.names = c("Celltype", "Gene", "Mean log2FC", "Adj. P-value")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")
Celltype Gene Mean log2FC Adj. P-value
Early Exc. Neuron 2 NRP1 1.124 0
Early Exc. Neuron 2 DPY19L1 1.096 0
Early Exc. Neuron 2 SOX11 0.624 0
Late Exc. Neuron ZFHX4 2.034 0
Late Exc. Neuron RP11.436D23.1 1.751 0
Late Exc. Neuron LINC00478 2.164 0
Exc. Neuron 2 DOK5 2.873 0
Exc. Neuron 2 SATB2 1.677 0
Exc. Neuron 2 STMN2 0.857 0
Exc. Neuron 3 ACTN2 4.731 0
Exc. Neuron 3 LPL 3.325 0
Exc. Neuron 3 UNC5C 6.760 0
PFC Exc. Neuron 3 RP11.98J23.2 3.008 0
PFC Exc. Neuron 3 RP11.589F5.4 3.111 0
PFC Exc. Neuron 3 RP11.1415C14.3 3.198 0
Exc. Neuron 1 CRYM 3.523 0
Exc. Neuron 1 GRIK3 2.968 0
Exc. Neuron 1 FEZF2 3.872 0
PFC Exc. Neuron 1 CDH18 3.441 0
PFC Exc. Neuron 1 SORCS1 4.858 0
PFC Exc. Neuron 1 COL19A1 9.146 0
Early Exc. Neuron 1 NDST4 3.965 0
Early Exc. Neuron 1 STK17B 3.639 0
Early Exc. Neuron 1 CNTNAP2 2.501 0
PFC Exc. Neuron 2 CPNE8 5.170 0
PFC Exc. Neuron 2 CBLN2 4.410 0
PFC Exc. Neuron 2 SLN 4.456 0
Table 1: The top 3 markers genes for each identified celltype

6.2 Pseudotime estimation

After estimating principal curves across our UMAP embedding with slingshot, we generate per-lineage pseudotime values for each cell. We use our clustering as a source of structure in the data, and assign start and end clusters based on age in PCW. In addition, we mean-aggregate the pseudotime values for each cell in order to get a total pseudotime which represents overall differentiation regardless of lineage.

Code
sling_res <- slingshot(Embeddings(seu_cortex, "umap"), 
                       clusterLabels = seu_cortex$seurat_clusters, 
                       start.clus = c("3"), 
                       end.clus = c("5", "6", "7"), 
                       approx_points = 1e3)
sling_curves <- slingCurves(sling_res, as.df = TRUE)
sling_mst <- slingMST(sling_res, as.df = TRUE)
sling_pt <- as.data.frame(slingPseudotime(sling_res)) %>% 
            rowwise() %>% 
            mutate(PT_Overall = mean(c_across(starts_with("Lineage")), na.rm = TRUE)) %>% 
            ungroup() %>% 
            mutate(across(c(starts_with("Lineage"), PT_Overall), 
                          \(x) (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))))
seu_cortex <- AddMetaData(seu_cortex, 
                          metadata = sling_pt$PT_Overall, 
                          col.name = "PT_Overall")

The graph structure estimate by slingshot is called a minimum spanning tree (MST), and describes the relationships between clusters in an undirected manner.

Code
p7 <- data.frame(Embeddings(seu_cortex, "umap")) %>% 
      mutate(leiden = seu_cortex$seurat_clusters) %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = leiden)) + 
      geom_point(size = 1.25, alpha = 0.75) + 
      geom_path(data = sling_mst, mapping = aes(x = umap_1, y = umap_2, group = Lineage), 
                linewidth = 1.25, 
                color = "black") + 
      geom_point(data = sling_mst, mapping = aes(x = umap_1, y = umap_2, fill = Cluster), 
                color = "black", 
                shape = 21, 
                size = 4.5, 
                stroke = 1.25, 
                show.legend = FALSE) + 
      scale_color_manual(values = palette_cluster) + 
      scale_fill_manual(values = palette_cluster) + 
      labs(x = "UMAP 1", y = "UMAP 2") + 
      theme_scLANE(umap = TRUE) +  
      theme(legend.title = element_blank()) + 
      guide_umap()
p7
Figure 3: UMAP embedding with MST from Slinghot overlaid

Since our data have real-life experimental timepoints, it makes sense to normalize the pseudotime within each lineage to \([0, 1]\).

Code
p8 <- data.frame(Embeddings(seu_cortex, "umap")) %>% 
      bind_cols(sling_pt) %>% 
      tidyr::pivot_longer(starts_with("Lineage"), 
                          names_to = "lineage", 
                          values_to = "pseudotime") %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = pseudotime)) + 
      facet_wrap(~lineage, nrow = 3) + 
      geom_point(size = 1.25, alpha = 0.75) + 
      labs(x = "UMAP 1", 
           y = "UMAP 2", 
           color = "Pseudotime") + 
      scale_color_gradientn(colors = palette_heatmap, labels = scales::label_number(accuracy = .01)) + 
      theme_scLANE(umap = TRUE)
p8
Figure 4: UMAP embedding colored by per-lineage pseudotime

The principal curves show a trifurcating lineage structure.

Code
p9 <- data.frame(Embeddings(seu_cortex, "umap")) %>% 
      mutate(leiden = seu_cortex$seurat_clusters) %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = leiden)) + 
      geom_point(size = 1.25, alpha = 0.75) + 
      geom_path(data = sling_curves,
                mapping = aes(x = umap_1, y = umap_2, group = Lineage), 
                color = "black", 
                linewidth = 1.5, 
                alpha = 0.75, 
                lineend = "round") + 
      scale_color_manual(values = palette_cluster) + 
      labs(x = "UMAP 1", y = "UMAP 2") + 
      theme_scLANE(umap = TRUE) + 
      theme(legend.title = element_blank()) + 
      guides(color = guide_legend(override.aes = list(size = 4, alpha = 1, stroke = 0.25)))
p9
Figure 5: UMAP embedding with principal curves from Slinghot overlaid in black

Lastly, we plot the distribution of mean-aggregated pseudotime per-timepoint and see that later timepoints have, in general, larger values of overall pseudotime. This indicates that our estimated pseudotime is linked to real experimental time in a meaningful (though fairly noisy) way.

Code
p10 <- select(seu_cortex@meta.data, 
              Age_in_Weeks, 
              PT_Overall) %>% 
       mutate(Age_in_Weeks = as.factor(Age_in_Weeks)) %>% 
       ggplot(aes(x = Age_in_Weeks, y = PT_Overall, color = Age_in_Weeks)) + 
       ggbeeswarm::geom_quasirandom(size = 1.25, 
                                    alpha = 0.75, 
                                    show.legend = FALSE) + 
       stat_summary(geom = "point", 
                    fun = "mean", 
                    color = "black",
                    size = 3) + 
       scale_color_manual(values = palette_timepoint) + 
       labs(x = "Age (PCW)", y = "Mean Psueodtime") + 
       theme_scLANE()
p10
Figure 6: Beeswarm plot displaying the distribution of mean-aggregated pseudotime for each timepoint

6.3 Trajectory DE testing

Using scLANE (see the GitHub repository or the preprint for details) we perform trajectory DE significance testing for each HVG across each lineage.

Code
top3k_hvg <- HVFInfo(seu_cortex) %>% 
             arrange(desc(variance.standardized)) %>% 
             slice_head(n = 3000) %>% 
             rownames(.)
cell_offset <- createCellOffset(seu_cortex)
pt_df <- select(sling_pt, -PT_Overall)
scLANE_models <- testDynamic(seu_cortex, 
                             pt = pt_df, 
                             genes = top3k_hvg,
                             size.factor.offset = cell_offset, 
                             n.cores = 6L, 
                             verbose = FALSE)
scLANE testing completed for 3000 genes across 3 lineages in 13.321 mins
Code
scLANE_de_res <- getResultsDE(scLANE_models)

We now have lineage-specific trajectory DE statistics for each gene:

Code
select(scLANE_de_res, 
       Gene, 
       Lineage, 
       Test_Stat, 
       P_Val, 
       P_Val_Adj,
       Gene_Dynamic_Lineage, 
       Gene_Dynamic_Overall) %>% 
  mutate(Gene_Dynamic_Lineage = if_else(Gene_Dynamic_Lineage == 1, "Dynamic", "Static"), 
         Gene_Dynamic_Overall = if_else(Gene_Dynamic_Overall == 1, "Dynamic", "Static")) %>% 
  slice_head(n = 10) %>% 
  kableExtra::kbl(digits = 4, 
                  booktabs = TRUE, 
                  col.names = c("Gene", "Lineage", "LRT stat.", "P-value", "Adj. p-value", "Gene status (lineage)", "Gene status (overall)")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")
Gene Lineage LRT stat. P-value Adj. p-value Gene status (lineage) Gene status (overall)
LMO3 B 4030239 0 0 Dynamic Dynamic
LMO3 A 3924999 0 0 Dynamic Dynamic
MEF2C B 3650115 0 0 Dynamic Dynamic
LMO3 C 3439387 0 0 Dynamic Dynamic
LIMCH1 B 3100288 0 0 Dynamic Dynamic
LIMCH1 A 3097931 0 0 Dynamic Dynamic
LINC01102 B 2857935 0 0 Dynamic Dynamic
LINC01102 A 2699231 0 0 Dynamic Dynamic
SHISA2 B 2086787 0 0 Dynamic Dynamic
SHISA2 A 2046352 0 0 Dynamic Dynamic
Table 2: Top-10 most DE genes from scLANE

We identify a set of dynamic genes - dynamic meaning DE over any of the three lineages - to be used for downstream analysis.

Code
dyn_genes <- filter(scLANE_de_res, Gene_Dynamic_Overall == 1) %>% 
             distinct(Gene) %>% 
             pull(Gene)

Lastly, we pull a table of gene dynamics for each lineage.

Code
smoothed_counts <- smoothedCountsMatrix(scLANE_models, 
                                        size.factor.offset = cell_offset, 
                                        pt = pt_df, 
                                        genes = dyn_genes, 
                                        log1p.norm = TRUE, 
                                        n.cores = 4L)

6.4 Constructing a GRN from scratch

Building a GRN requires two main inputs - a set of transcription factors, and a set of potential target genes.

Code
hs_ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
hs_tf_raw <- readr::read_csv("http://humantfs.ccbr.utoronto.ca/download/v_1.01/DatabaseExtract_v_1.01.csv",
                             col_select = -1,
                             num_threads = 2L,
                             show_col_types = FALSE) %>%
             janitor::clean_names() %>%
             filter(is_tf == "Yes") %>%
             select(ensembl_id)
hs_tfs <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol", "entrezgene_id", "description", "gene_biotype"),
                filters = "ensembl_gene_id",
                values = hs_tf_raw$ensembl_id,
                mart = hs_ensembl,
                uniqueRows = TRUE) %>%
          rename(ensembl_id = ensembl_gene_id,
                 entrez_id = entrezgene_id) %>%
          arrange(ensembl_id) %>%
          mutate(hgnc_symbol = if_else(hgnc_symbol == "", NA_character_, hgnc_symbol),
                 description = gsub("\\[Source.*", "", description))

Along with the HGNC symbol, we used BiomaRt to retrieve the Ensembl & Entrez IDs and a description for every TF.

Code
slice_sample(hs_tfs, n = 10) %>% 
  kableExtra::kbl(booktabs = TRUE, 
                  col.names = c("Ensembl ID", "HGNC symbol", "Entrez ID", "Description", "Biotype")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")
Ensembl ID HGNC symbol Entrez ID Description Biotype
ENSG00000173041 ZNF680 340252 zinc finger protein 680 protein_coding
ENSG00000149922 TBX6 6911 T-box transcription factor 6 protein_coding
ENSG00000085274 MYNN 55892 myoneurin protein_coding
ENSG00000169740 ZNF32 7580 zinc finger protein 32 protein_coding
ENSG00000214534 ZNF705EP NA zinc finger protein 705E, pseudogene transcribed_unprocessed_pseudogene
ENSG00000129071 MBD4 8930 methyl-CpG binding domain 4, DNA glycosylase protein_coding
ENSG00000114853 ZBTB47 92999 zinc finger and BTB domain containing 47 protein_coding
ENSG00000111206 FOXM1 2305 forkhead box M1 protein_coding
ENSG00000184436 THAP7 80764 THAP domain containing 7 protein_coding
ENSG00000196172 ZNF681 148213 zinc finger protein 681 protein_coding
Table 3: Human transcription factors

We split our set of 295 trajectory DE genes into the two categories using the existing set of human TFs I found online.

Code
valid_hs_tfs <- hs_tfs$hgnc_symbol[hs_tfs$hgnc_symbol %in% dyn_genes]
target_genes <- dyn_genes[!dyn_genes %in% valid_hs_tfs]

The GRN has the following structure - for every target gene, a penalized model is fit with the dynamics of every possible TF as features (see below). The model’s internal feature selection identifies a subset of TFs whose dynamics are predictive of the target gene’s dynamics. See e.g., the original XGBoost paper for details.

\[ \text{Dynamics}_{\text{Target}} = \begin{bmatrix} \text{Dynamics}_{\text{TF}_1} & \text{Dynamics}_{\text{TF}_2} & \dots & \text{Dynamics}_{\text{TF}_N} \end{bmatrix} \tag{1}\]

Our GRNs will be lineage-specific, since the matrix of gene dynamics differs by lineage. First we fit the GRN for lineage A; I’ve opted to use the LightGBM package to fit the models as it’s lightweight and in general much faster than other options such as XGBoost. In addition, I prefer this method to alternative penalized methods such as LASSO - boosted trees better capture nonlinearities in the data, though LASSO does offer advantages such as significance testing for coefficients and linear interpretability. Both methods have built-in cross-validation (CV) routines (lightgbm::lgb.cv() and glmnet::cv.glmnet(), respectively); here we use 5-fold CV to select the best set of hyperparameters before training the final model.

Note

Since we’re using the foreach and doSNOW packages to loop over the set of target genes in parallel, we need to ensure that our LightGBM model only uses a single thread when fitting the model. This is done by setting the parameters tree_learner = "serial" and num_threads = 1L in the model parameters list.

Code
cl <- parallel::makeCluster(6L)
doSNOW::registerDoSNOW(cl)
lgbm_params <- list(objective = "gamma", 
                    tree_learner = "serial", 
                    metric = "l2", 
                    boosting_type = "gbdt", 
                    num_threads = 1L, 
                    seed = 312)
grn_res_linA <- foreach(g = seq(target_genes), 
                        .combine = "list",
                        .multicombine = ifelse(length(target_genes) > 1, TRUE, FALSE),
                        .maxcombine = ifelse(length(target_genes) > 1, length(target_genes), 2),
                        .packages = c("lightgbm", "dplyr"),
                        .errorhandling = "pass",
                        .inorder = TRUE,
                        .verbose = FALSE) %dopar% {
                          feature_mat <- smoothed_counts$Lineage_A[, valid_hs_tfs]
                          resp_var <- smoothed_counts$Lineage_A[, target_genes[g]]
                          lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                          lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                      data = lgbm_data, 
                                                      nrounds = 100L, 
                                                      nfold = 5L, 
                                                      stratified = FALSE)
                          lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = lgbm_cv$best_iter)
                          imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                       dplyr::mutate(Lineage = "A", 
                                                     Target_Gene = target_genes[g], 
                                                     .before = 1)
                          imp_table
                        }
names(grn_res_linA) <- target_genes
grn_table_linA <- purrr::imap(grn_res_linA, \(x, y) {
  if (!inherits(x, "data.frame")) {
    empty_res <- data.frame(Lineage = "A", 
                            Target_Gene = y, 
                            Feature = NA_character_, 
                            Gain = NA_real_, 
                            Cover = NA_real_, 
                            Frequency = NA_real_)
    return(empty_res)
  } else {
    return(x)
  }
})

We coerce the model output into a tidy table.

Code
grn_table_linA <- purrr::reduce(grn_table_linA, rbind) %>% 
                  rename(Tx_Factor = Feature) %>% 
                  filter(!is.na(Target_Gene), 
                         !is.na(Tx_Factor)) %>% 
                  arrange(Tx_Factor, desc(Frequency))

Next we add the Spearman correlation between TF and target gene dynamics, as well as the normalized gene expression correlation. We denote the relationship between TF and target gene as repression if the correlation between their dynamics is negative, and activation if the correlation is positive.

Warning

Usually the correlations between TF and target dynamics and TF and target expression are similar, or at least have the same directionality (positive or negative). However, when expression patterns are noisy, the correlation directions might differ (see Table 4 for examples). In this case, we’re opting to trust the dynamics more, as they are a smoothed version of expression - though an argument could be made for the opposite stance as well.

Code
dyn_cormat <- cor(smoothed_counts$Lineage_A, method = "spearman")
exp_cormat <- cor(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, ])), method = "spearman")
grn_table_linA <- mutate(grn_table_linA, 
                         Dyn_Cor = NA_real_, 
                         Exp_Cor = NA_real_)
for (i in seq(nrow(grn_table_linA))) {
  grn_table_linA$Dyn_Cor[i] <- dyn_cormat[grn_table_linA$Target_Gene[i], grn_table_linA$Tx_Factor[i]]
  grn_table_linA$Exp_Cor[i] <- exp_cormat[grn_table_linA$Target_Gene[i], grn_table_linA$Tx_Factor[i]]
}
grn_table_linA <- mutate(grn_table_linA, State = if_else(Dyn_Cor < 0, "Repression", "Activation"))

The output is shown below. Here we use the frequency with which the TF is included in the final boosted tree model as the feature importance, which can be interpreted as the strength of the predictive relationship between TF & target gene. See e.g., this StackExchange post for more detail.

Code
select(grn_table_linA, 
       Lineage, 
       Tx_Factor, 
       Target_Gene, 
       Frequency, 
       Dyn_Cor, 
       Exp_Cor, 
       State) %>% 
  slice_sample(n = 10) %>% 
  kableExtra::kbl(digits = 4, 
                  booktabs = TRUE, 
                  col.names = c("Lineage", "TF", "Target", "Importance", "Dynamics cor.", "Expression cor.", "State")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")
Lineage TF Target Importance Dynamics cor. Expression cor. State
A TSHZ2 KLHL14 0.0333 0.3740 0.0356 Activation
A CXXC4 PEX5L 0.0007 0.0926 0.0962 Activation
A EOMES MDM4 0.0170 0.6390 -0.0034 Activation
A CUX2 RP11.335E14.1 0.0490 0.5553 -0.0111 Activation
A FOXP2 LGI1 0.0133 -0.4933 0.0552 Repression
A TGIF2 PTPRR 0.0540 -0.0378 -0.0173 Repression
A FOXP2 RP11.1112C15.2 0.0260 -0.7971 0.0502 Repression
A CREB5 NTRK2 0.0566 -0.6718 -0.0825 Repression
A EOMES UNC5D 0.1240 0.9615 0.1371 Activation
A DLX5 AASS 0.0723 -0.3497 -0.0159 Repression
Table 4: GRN for dynamic genes across Lineage A

We repeat the fitting process for the other two lineages (expand this code block to see the details). Skip to Section 7 if you want a version of this logic that’s been wrapped into a function!

Code
grn_res_linB <- foreach(g = seq(target_genes), 
                        .combine = "list",
                        .multicombine = ifelse(length(target_genes) > 1, TRUE, FALSE),
                        .maxcombine = ifelse(length(target_genes) > 1, length(target_genes), 2),
                        .packages = c("lightgbm", "dplyr"),
                        .errorhandling = "pass",
                        .inorder = TRUE,
                        .verbose = FALSE) %dopar% {
                          feature_mat <- smoothed_counts$Lineage_B[, valid_hs_tfs]
                          resp_var <- smoothed_counts$Lineage_B[, target_genes[g]]
                          lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                          lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                      data = lgbm_data, 
                                                      nrounds = 100L, 
                                                      nfold = 5L, 
                                                      stratified = FALSE)
                          lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = lgbm_cv$best_iter)
                          imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                       dplyr::mutate(Lineage = "B", 
                                                     Target_Gene = target_genes[g], 
                                                     .before = 1)
                          imp_table
                        }
names(grn_res_linB) <- target_genes
grn_table_linB <- purrr::imap(grn_res_linB, \(x, y) {
  if (!inherits(x, "data.frame")) {
    empty_res <- data.frame(Lineage = "B", 
                            Target_Gene = y, 
                            Feature = NA_character_, 
                            Gain = NA_real_, 
                            Cover = NA_real_, 
                            Frequency = NA_real_)
    return(empty_res)
  } else {
    return(x)
  }
})
grn_table_linB <- purrr::reduce(grn_table_linB, rbind) %>% 
                  rename(Tx_Factor = Feature) %>% 
                  filter(!is.na(Target_Gene), 
                         !is.na(Tx_Factor)) %>% 
                  arrange(Tx_Factor, desc(Frequency))
dyn_cormat <- cor(smoothed_counts$Lineage_B, method = "spearman")
grn_table_linB <- mutate(grn_table_linB, 
                         Dyn_Cor = NA_real_, 
                         Exp_Cor = NA_real_)
for (i in seq(nrow(grn_table_linB))) {
  grn_table_linB$Dyn_Cor[i] <- dyn_cormat[grn_table_linB$Target_Gene[i], grn_table_linB$Tx_Factor[i]]
  grn_table_linB$Exp_Cor[i] <- exp_cormat[grn_table_linB$Target_Gene[i], grn_table_linB$Tx_Factor[i]]
}
grn_table_linB <- mutate(grn_table_linB, State = if_else(Dyn_Cor < 0, "Repression", "Activation"))
grn_res_linC <- foreach(g = seq(target_genes), 
                        .combine = "list",
                        .multicombine = ifelse(length(target_genes) > 1, TRUE, FALSE),
                        .maxcombine = ifelse(length(target_genes) > 1, length(target_genes), 2),
                        .packages = c("lightgbm", "dplyr"),
                        .errorhandling = "pass",
                        .inorder = TRUE,
                        .verbose = FALSE) %dopar% {
                          feature_mat <- smoothed_counts$Lineage_C[, valid_hs_tfs]
                          resp_var <- smoothed_counts$Lineage_C[, target_genes[g]]
                          lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                          lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                      data = lgbm_data, 
                                                      nrounds = 100L, 
                                                      nfold = 5L, 
                                                      stratified = FALSE)
                          lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = lgbm_cv$best_iter)
                          imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                       dplyr::mutate(Lineage = "C", 
                                                     Target_Gene = target_genes[g], 
                                                     .before = 1)
                          imp_table
                        }
names(grn_res_linC) <- target_genes
grn_table_linC <- purrr::imap(grn_res_linC, \(x, y) {
  if (!inherits(x, "data.frame")) {
    empty_res <- data.frame(Lineage = "C", 
                            Target_Gene = y, 
                            Feature = NA_character_, 
                            Gain = NA_real_, 
                            Cover = NA_real_, 
                            Frequency = NA_real_)
    return(empty_res)
  } else {
    return(x)
  }
})
grn_table_linC <- purrr::reduce(grn_table_linC, rbind) %>% 
                  rename(Tx_Factor = Feature) %>% 
                  filter(!is.na(Target_Gene), 
                         !is.na(Tx_Factor)) %>% 
                  arrange(Tx_Factor, desc(Frequency))
dyn_cormat <- cor(smoothed_counts$Lineage_C, method = "spearman")
grn_table_linC <- mutate(grn_table_linC, 
                         Dyn_Cor = NA_real_, 
                         Exp_Cor = NA_real_)
for (i in seq(nrow(grn_table_linC))) {
  grn_table_linC$Dyn_Cor[i] <- dyn_cormat[grn_table_linC$Target_Gene[i], grn_table_linC$Tx_Factor[i]]
  grn_table_linC$Exp_Cor[i] <- exp_cormat[grn_table_linC$Target_Gene[i], grn_table_linC$Tx_Factor[i]]
}
grn_table_linC <- mutate(grn_table_linC, State = if_else(Dyn_Cor < 0, "Repression", "Activation"))
parallel::stopCluster(cl)

We collate all three GRNs into a single overall table:

Code
grn_table_all <- bind_rows(grn_table_linA, 
                           grn_table_linB, 
                           grn_table_linC) %>% 
                 arrange(Tx_Factor, Target_Gene)

For each lineage we create a directed, weighted graph of the relationships between TFs and target genes using the igraph package.

Code
graph_df_linA <- select(grn_table_linA, 
                        Tx_Factor, 
                        Target_Gene,
                        Frequency, 
                        State) %>% 
                 arrange(Tx_Factor, desc(Frequency)) %>% 
                 with_groups(Tx_Factor,
                             slice_head, 
                             n = 10)
graph_obj_linA <- graph.data.frame(graph_df_linA, directed = TRUE)
graph_df_linB <- select(grn_table_linB, 
                        Tx_Factor, 
                        Target_Gene,
                        Frequency, 
                        State) %>% 
                 arrange(Tx_Factor, desc(Frequency)) %>% 
                 with_groups(Tx_Factor,
                             slice_head, 
                             n = 10)
graph_obj_linB <- graph.data.frame(graph_df_linB, directed = TRUE)
graph_df_linC <- select(grn_table_linC, 
                        Tx_Factor, 
                        Target_Gene,
                        Frequency, 
                        State) %>% 
                 arrange(Tx_Factor, desc(Frequency)) %>% 
                 with_groups(Tx_Factor,
                             slice_head, 
                             n = 10)
graph_obj_linC <- graph.data.frame(graph_df_linC, directed = TRUE)

One TF of particular interest is SRY-box transcription factor 4 (SOX4). This TF helps to regulate development and is involved in the specification of cell fate, so a comparison of its behavior across lineages should be pretty neat. We use a Fruchterman-Reingold layout to embed the graphs in 2D space via the layout_with_fr() function from igraph. We see that not only do the top target genes differ widely by lineage, so too do the directions of the relationships (activation vs. repression), and the strengths of those relationships. For example, in lineage A the top target gene is BACE2, whereas for lineage B it’s LINC01268.

Code
set.seed(1)
freqs <- purrr::map(list(graph_df_linA, graph_df_linB, graph_df_linC), \(x) {
  filter(x, Tx_Factor == "SOX4") %>% 
  pull(Frequency)
})
freq_min <- min(purrr::reduce(freqs, c))
freq_max <- max(purrr::reduce(freqs, c))
p11 <- fortify(graph_obj_linA, 
               layout = layout_with_fr(graph = graph_obj_linA, 
                                       grid = "nogrid", 
                                       weights = E(graph_obj_linA)$weights^3),
               arrow.gap = .02) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       filter(name %in% (filter(graph_df_linA, Tx_Factor == "SOX4") %>% pull(Target_Gene)) | name == "SOX4") %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage A") + 
       geom_edges(aes(linewidth = Frequency, alpha = Frequency, color = State), 
                  arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                      size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 1.5), limits = c(freq_min, freq_max)) + 
       scale_alpha_continuous(range = c(0.75, 1), limits = c(freq_min, freq_max)) + 
       labs(x = "FR 1", 
            y = "FR 2", 
            color = "TF State", 
            fill = "Gene Type") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
              color = guide_legend(order = 2))
p12 <- fortify(graph_obj_linB, 
               layout = layout_with_fr(graph = graph_obj_linB, 
                                       grid = "nogrid", 
                                       weights = E(graph_obj_linB)$weights^3),
               arrow.gap = .02) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       filter(name %in% (filter(graph_df_linB, Tx_Factor == "SOX4") %>% pull(Target_Gene)) | name == "SOX4") %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage B") + 
       geom_edges(aes(linewidth = Frequency, alpha = Frequency, color = State), 
                  arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                      size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 1.5), limits = c(freq_min, freq_max)) + 
       scale_alpha_continuous(range = c(0.75, 1), limits = c(freq_min, freq_max)) + 
       labs(x = "FR 1", 
            y = "FR 2", 
            color = "TF State", 
            fill = "Gene Type") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
              color = guide_legend(order = 2))
p13 <- fortify(graph_obj_linC, 
               layout = layout_with_fr(graph = graph_obj_linC, 
                                       grid = "nogrid", 
                                       weights = E(graph_obj_linC)$weights^3),
               arrow.gap = .02) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       filter(name %in% (filter(graph_df_linC, Tx_Factor == "SOX4") %>% pull(Target_Gene)) | name == "SOX4") %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage C") + 
       geom_edges(aes(linewidth = Frequency, alpha = Frequency, color = State), 
                  arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                      size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 1.5), limits = c(freq_min, freq_max)) + 
       scale_alpha_continuous(range = c(0.75, 1), limits = c(freq_min, freq_max)) + 
       labs(x = "FR 1", 
            y = "FR 2", 
            color = "TF State", 
            fill = "Gene Type") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
              color = guide_legend(order = 2))
p14 <- (p11 / p12 / p13) + plot_layout(guides = "collect")
p14
Figure 7: Fruchterman-Reingold embeddings of the relationships between SRY-box transcription factor 4 and lineage-specific target genes

7 Conclusions

Hopefully this analysis has demonstrated the utility of trajectory GRNs for comparing gene dynamics across lineages. Beyond just visual comparison of how gene expression changes over pseudotime, trajectory GRNs allow us to identify which transcription factors are activating or suppressing target genes, how top TF targets differ between lineages, and the ways in which TF-target relationships change based on the cell fate towards which cells are progressing. To conclude, I’ve written up the methodology for creating a trajectory GRN into a function, which is shown in the code block below. It includes functionality for a progress bar if desired, and takes as input a matrix of gene dynamics (expr.mat), a set of dynamic genes (dyn.genes), and a set of transcription factors (tx.factors). Eventually I plan to expand this functionality and turn it into an R package, but for now it’ll stay a simple function.

Code
estimateTGRN <- function(expr.mat = NULL,
                         dyn.genes = NULL, 
                         tx.factors = NULL, 
                         log1p.norm = TRUE,
                         cor.method = "spearman",
                         n.cores = 4L, 
                         verbose = TRUE, 
                         random.seed = 312) {
  # check inputs 
  if (is.null(expr.mat) || is.null(dyn.genes) || is.null(tx.factors)) { stop("Arguments to estimateTGRN() are missing.") }
  # ID non-TF genes & filter out non-dynamic TFs
  target_genes <- dyn.genes[!dyn.genes %in% tx.factors]
  tx.factors <- tx.factors[tx.factors %in% dyn.genes]
  # normalize matrix of dynamics
  if (log1p.norm) {
    expr.mat <- log1p(expr.mat)
  }
  # set up progress bar
  if (verbose) {
    withr::with_output_sink(tempfile(), {
      pb <- utils::txtProgressBar(0, length(target_genes), style = 3)
    })
    progress_fun <- function(n) utils::setTxtProgressBar(pb, n)
    snow_opts <- list(progress = progress_fun)
  } else {
    snow_opts <- list()
  }
  # set up parallel processing
  if (n.cores > 1L) {
    cl <- parallel::makeCluster(6L)
    doSNOW::registerDoSNOW(cl)
  } else {
    cl <- foreach::registerDoSEQ()
  }
  # set up LightGBM model settings
  lgbm_params <- list(objective = "gamma", 
                      tree_learner = "serial", 
                      metric = "l2", 
                      boosting_type = "gbdt", 
                      num_threads = 1L, 
                      seed = random.seed)
  # estimate trajectory GRN
  grn_res <- foreach::foreach(g = seq(target_genes), 
                              .combine = "list",
                              .multicombine = ifelse(length(target_genes) > 1, TRUE, FALSE),
                              .maxcombine = ifelse(length(target_genes) > 1, length(target_genes), 2),
                              .packages = c("lightgbm", "dplyr"),
                              .export = c("expr.mat", "tx.factors", "target_genes", "lgbm_params"), 
                              .errorhandling = "pass",
                              .inorder = TRUE,
                              .verbose = FALSE,
                              .options.snow = snow_opts) %dopar% {
                                feature_mat <- expr.mat[, tx.factors]
                                resp_var <- expr.mat[, target_genes[g]]
                                lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                                lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = 100L, 
                                                            nfold = 5L, 
                                                            stratified = FALSE)
                                lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                                  data = lgbm_data, 
                                                                  nrounds = lgbm_cv$best_iter)
                                imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                             dplyr::mutate(Target_Gene = target_genes[g], .before = 1)
                                imp_table
                              }
  names(grn_res) <- target_genes
  if (n.cores > 1L) {
    parallel::stopCluster(cl)
  }
  # format GRN table
  grn_table <- purrr::imap(grn_res, \(x, y) {
    if (!inherits(x, "data.frame")) {
      empty_res <- data.frame(Target_Gene = y, 
                              Feature = NA_character_, 
                              Gain = NA_real_, 
                              Cover = NA_real_, 
                              Frequency = NA_real_)
      return(empty_res)
    } else {
      return(x)
    }
  })
  grn_table <- purrr::reduce(grn_table, rbind) %>% 
               dplyr::rename(Tx_Factor = Feature) %>% 
               dplyr::filter(!is.na(Target_Gene), 
                             !is.na(Tx_Factor))
  # add correlation b/t TF and each target gene 
  dyn_cormat <- stats::cor(expr.mat[, dyn.genes], method = cor.method)
  grn_table <- dplyr::mutate(grn_table, Dyn_Cor = NA_real_)
  for (i in seq(nrow(grn_table))) {
    grn_table$Dyn_Cor[i] <- dyn_cormat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
  }
  grn_table <- dplyr::mutate(grn_table, State = dplyr::if_else(Dyn_Cor < 0, "Repression", "Activation"))
  return(grn_table)
}

8 Session info

Code
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Sonoma 14.1.2
 system   x86_64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-01-21
 pandoc   3.1.9 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package                * version    date (UTC) lib source
 abind                    1.4-5      2016-07-21 [1] CRAN (R 4.3.0)
 AnnotationDbi            1.64.1     2023-11-03 [1] Bioconductor
 AnnotationFilter         1.26.0     2023-10-24 [1] Bioconductor
 AnnotationHub            3.10.0     2023-10-24 [1] Bioconductor
 backports                1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
 beeswarm                 0.4.0      2021-06-01 [1] CRAN (R 4.3.0)
 bigassertr               0.1.6      2023-01-10 [1] CRAN (R 4.3.0)
 bigparallelr             0.3.2      2021-10-02 [1] CRAN (R 4.3.0)
 bigstatsr                1.5.12     2022-10-14 [1] CRAN (R 4.3.0)
 Biobase                * 2.62.0     2023-10-24 [1] Bioconductor
 BiocFileCache            2.10.1     2023-10-26 [1] Bioconductor
 BiocGenerics           * 0.48.1     2023-11-01 [1] Bioconductor
 BiocIO                   1.12.0     2023-10-24 [1] Bioconductor
 BiocManager              1.30.22    2023-08-08 [1] CRAN (R 4.3.0)
 BiocParallel             1.36.0     2023-10-24 [1] Bioconductor
 BiocVersion              3.18.1     2023-11-15 [1] Bioconductor
 biomaRt                * 2.58.0     2023-10-24 [1] Bioconductor
 Biostrings               2.70.1     2023-10-25 [1] Bioconductor
 bit                      4.0.5      2022-11-15 [1] CRAN (R 4.3.0)
 bit64                    4.0.5      2020-08-30 [1] CRAN (R 4.3.0)
 bitops                   1.0-7      2021-04-24 [1] CRAN (R 4.3.0)
 blob                     1.2.4      2023-03-17 [1] CRAN (R 4.3.0)
 boot                     1.3-28.1   2022-11-22 [1] CRAN (R 4.3.2)
 broom                    1.0.5      2023-06-09 [1] CRAN (R 4.3.0)
 broom.mixed              0.2.9.4    2022-04-17 [1] CRAN (R 4.3.0)
 cachem                   1.0.8      2023-05-01 [1] CRAN (R 4.3.0)
 car                      3.1-2      2023-03-30 [1] CRAN (R 4.3.0)
 carData                  3.0-5      2022-01-06 [1] CRAN (R 4.3.0)
 cli                      3.6.2      2023-12-11 [1] CRAN (R 4.3.0)
 cluster                  2.1.6      2023-12-01 [1] CRAN (R 4.3.0)
 codetools                0.2-19     2023-02-01 [1] CRAN (R 4.3.2)
 colorspace               2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
 cowplot                  1.1.2      2023-12-15 [1] CRAN (R 4.3.0)
 crayon                   1.5.2      2022-09-29 [1] CRAN (R 4.3.0)
 curl                     5.2.0      2023-12-08 [1] CRAN (R 4.3.0)
 data.table               1.14.10    2023-12-08 [1] CRAN (R 4.3.0)
 DBI                      1.2.0      2023-12-21 [1] CRAN (R 4.3.0)
 dbplyr                   2.4.0      2023-10-26 [1] CRAN (R 4.3.0)
 DelayedArray             0.28.0     2023-10-24 [1] Bioconductor
 DelayedMatrixStats       1.24.0     2023-10-24 [1] Bioconductor
 deldir                   2.0-2      2023-11-23 [1] CRAN (R 4.3.0)
 digest                   0.6.33     2023-07-07 [1] CRAN (R 4.3.0)
 doParallel               1.0.17     2022-02-07 [1] CRAN (R 4.3.0)
 doSNOW                   1.0.20     2022-02-04 [1] CRAN (R 4.3.0)
 dotCall64                1.1-1      2023-11-28 [1] CRAN (R 4.3.0)
 dplyr                  * 1.1.4      2023-11-17 [1] CRAN (R 4.3.0)
 ellipsis                 0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
 ensembldb                2.26.0     2023-10-24 [1] Bioconductor
 evaluate                 0.23       2023-11-01 [1] CRAN (R 4.3.0)
 ExperimentHub            2.10.0     2023-10-24 [1] Bioconductor
 fansi                    1.0.6      2023-12-08 [1] CRAN (R 4.3.0)
 farver                   2.1.1      2022-07-06 [1] CRAN (R 4.3.0)
 fastDummies              1.7.3      2023-07-06 [1] CRAN (R 4.3.0)
 fastmap                  1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
 ff                       4.0.9      2023-01-25 [1] CRAN (R 4.3.0)
 filelock                 1.0.3      2023-12-11 [1] CRAN (R 4.3.0)
 fitdistrplus             1.1-11     2023-04-25 [1] CRAN (R 4.3.0)
 flock                    0.7        2016-11-12 [1] CRAN (R 4.3.0)
 forcats                  1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
 foreach                * 1.5.2      2022-02-02 [1] CRAN (R 4.3.0)
 furrr                    0.3.1      2022-08-15 [1] CRAN (R 4.3.0)
 future                   1.33.1     2023-12-22 [1] CRAN (R 4.3.0)
 future.apply             1.11.1     2023-12-21 [1] CRAN (R 4.3.0)
 gamlss                   5.4-20     2023-10-04 [1] CRAN (R 4.3.0)
 gamlss.data              6.0-2      2021-11-07 [1] CRAN (R 4.3.0)
 gamlss.dist              6.1-1      2023-08-23 [1] CRAN (R 4.3.0)
 geeM                     0.10.1     2018-06-18 [1] CRAN (R 4.3.0)
 generics                 0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 GenomeInfoDb           * 1.38.5     2023-12-28 [1] Bioconductor 3.18 (R 4.3.2)
 GenomeInfoDbData         1.2.11     2023-12-22 [1] Bioconductor
 GenomicAlignments        1.38.0     2023-10-24 [1] Bioconductor
 GenomicFeatures          1.54.1     2023-10-29 [1] Bioconductor
 GenomicRanges          * 1.54.1     2023-10-29 [1] Bioconductor
 ggbeeswarm               0.7.2      2023-04-29 [1] CRAN (R 4.3.0)
 ggnetwork              * 0.5.12     2023-03-06 [1] CRAN (R 4.3.0)
 ggplot2                * 3.4.4      2023-10-12 [1] CRAN (R 4.3.0)
 ggpubr                   0.6.0      2023-02-10 [1] CRAN (R 4.3.0)
 ggrepel                  0.9.4      2023-10-13 [1] CRAN (R 4.3.0)
 ggridges                 0.5.5      2023-12-15 [1] CRAN (R 4.3.0)
 ggsignif                 0.6.4      2022-10-13 [1] CRAN (R 4.3.0)
 glm2                   * 1.2.1      2018-08-11 [1] CRAN (R 4.3.0)
 glmmTMB                  1.1.8      2023-10-07 [1] CRAN (R 4.3.0)
 globals                  0.16.2     2022-11-21 [1] CRAN (R 4.3.0)
 glue                     1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
 goftest                  1.2-3      2021-10-07 [1] CRAN (R 4.3.0)
 gridExtra                2.3        2017-09-09 [1] CRAN (R 4.3.0)
 gtable                   0.3.4      2023-08-21 [1] CRAN (R 4.3.0)
 highr                    0.10       2022-12-22 [1] CRAN (R 4.3.0)
 hms                      1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
 htmltools                0.5.7      2023-11-03 [1] CRAN (R 4.3.0)
 htmlwidgets              1.6.4      2023-12-06 [1] CRAN (R 4.3.0)
 httpuv                   1.6.13     2023-12-06 [1] CRAN (R 4.3.0)
 httr                     1.4.7      2023-08-15 [1] CRAN (R 4.3.0)
 ica                      1.0-3      2022-07-08 [1] CRAN (R 4.3.0)
 igraph                 * 1.6.0      2023-12-11 [1] CRAN (R 4.3.0)
 interactiveDisplayBase   1.40.0     2023-10-24 [1] Bioconductor
 IRanges                * 2.36.0     2023-10-24 [1] Bioconductor
 irlba                    2.3.5.1    2022-10-03 [1] CRAN (R 4.3.0)
 iterators                1.0.14     2022-02-05 [1] CRAN (R 4.3.0)
 janitor                  2.2.0      2023-02-02 [1] CRAN (R 4.3.0)
 jsonlite                 1.8.8      2023-12-04 [1] CRAN (R 4.3.0)
 kableExtra               1.3.4      2021-02-20 [1] CRAN (R 4.3.0)
 KEGGREST                 1.42.0     2023-10-24 [1] Bioconductor
 KernSmooth               2.23-22    2023-07-10 [1] CRAN (R 4.3.2)
 knitr                    1.45       2023-10-30 [1] CRAN (R 4.3.0)
 labeling                 0.4.3      2023-08-29 [1] CRAN (R 4.3.0)
 later                    1.3.2      2023-12-06 [1] CRAN (R 4.3.0)
 lattice                  0.22-5     2023-10-24 [1] CRAN (R 4.3.0)
 lazyeval                 0.2.2      2019-03-15 [1] CRAN (R 4.3.0)
 leiden                   0.4.3.1    2023-11-17 [1] CRAN (R 4.3.0)
 lifecycle                1.0.4      2023-11-07 [1] CRAN (R 4.3.0)
 limma                    3.58.1     2023-10-31 [1] Bioconductor
 listenv                  0.9.0      2022-12-16 [1] CRAN (R 4.3.0)
 lme4                     1.1-35.1   2023-11-05 [1] CRAN (R 4.3.0)
 lmtest                   0.9-40     2022-03-21 [1] CRAN (R 4.3.0)
 lubridate                1.9.3      2023-09-27 [1] CRAN (R 4.3.0)
 magrittr               * 2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 MASS                     7.3-60     2023-05-04 [1] CRAN (R 4.3.0)
 Matrix                   1.6-4      2023-11-30 [1] CRAN (R 4.3.0)
 MatrixGenerics         * 1.14.0     2023-10-24 [1] Bioconductor
 matrixStats            * 1.2.0      2023-12-11 [1] CRAN (R 4.3.0)
 memoise                  2.0.1      2021-11-26 [1] CRAN (R 4.3.0)
 mgcv                     1.9-1      2023-12-21 [1] CRAN (R 4.3.0)
 mime                     0.12       2021-09-28 [1] CRAN (R 4.3.0)
 miniUI                   0.1.1.1    2018-05-18 [1] CRAN (R 4.3.0)
 minqa                    1.2.6      2023-09-11 [1] CRAN (R 4.3.0)
 munsell                  0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
 nlme                     3.1-164    2023-11-27 [1] CRAN (R 4.3.0)
 nloptr                   2.0.3      2022-05-26 [1] CRAN (R 4.3.0)
 numDeriv                 2016.8-1.1 2019-06-06 [1] CRAN (R 4.3.0)
 paletteer                1.5.0      2022-10-19 [1] CRAN (R 4.3.0)
 parallelly               1.36.0     2023-05-26 [1] CRAN (R 4.3.0)
 patchwork              * 1.1.3      2023-08-14 [1] CRAN (R 4.3.0)
 pbapply                  1.7-2      2023-06-27 [1] CRAN (R 4.3.0)
 pillar                   1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig                2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 plotly                   4.10.3     2023-10-21 [1] CRAN (R 4.3.0)
 plyr                     1.8.9      2023-10-02 [1] CRAN (R 4.3.0)
 png                      0.1-8      2022-11-29 [1] CRAN (R 4.3.0)
 polyclip                 1.10-6     2023-09-27 [1] CRAN (R 4.3.0)
 presto                   1.0.0      2023-12-27 [1] Github (immunogenomics/presto@31dc97f)
 prettyunits              1.2.0      2023-09-24 [1] CRAN (R 4.3.0)
 princurve              * 2.1.6      2021-01-18 [1] CRAN (R 4.3.0)
 prismatic                1.1.1      2022-08-15 [1] CRAN (R 4.3.0)
 progress                 1.2.3      2023-12-06 [1] CRAN (R 4.3.0)
 progressr                0.14.0     2023-08-10 [1] CRAN (R 4.3.0)
 promises                 1.2.1      2023-08-10 [1] CRAN (R 4.3.0)
 ProtGenerics             1.34.0     2023-10-24 [1] Bioconductor
 ps                       1.7.5      2023-04-18 [1] CRAN (R 4.3.0)
 purrr                    1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
 R6                       2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 RANN                     2.6.1      2019-01-08 [1] CRAN (R 4.3.0)
 rappdirs                 0.3.3      2021-01-31 [1] CRAN (R 4.3.0)
 RColorBrewer             1.1-3      2022-04-03 [1] CRAN (R 4.3.0)
 Rcpp                     1.0.11     2023-07-06 [1] CRAN (R 4.3.0)
 RcppAnnoy                0.0.21     2023-07-02 [1] CRAN (R 4.3.0)
 RcppEigen                0.3.3.9.4  2023-11-02 [1] CRAN (R 4.3.0)
 RcppHNSW                 0.5.0      2023-09-19 [1] CRAN (R 4.3.0)
 RCurl                    1.98-1.13  2023-11-02 [1] CRAN (R 4.3.0)
 readr                    2.1.4      2023-02-10 [1] CRAN (R 4.3.0)
 rematch2                 2.1.2      2020-05-01 [1] CRAN (R 4.3.0)
 reshape2                 1.4.4      2020-04-09 [1] CRAN (R 4.3.0)
 restfulr                 0.0.15     2022-06-16 [1] CRAN (R 4.3.0)
 reticulate               1.34.0     2023-10-12 [1] CRAN (R 4.3.0)
 rjson                    0.2.21     2022-01-09 [1] CRAN (R 4.3.0)
 rlang                    1.1.2      2023-11-04 [1] CRAN (R 4.3.0)
 rmarkdown                2.25       2023-09-18 [1] CRAN (R 4.3.0)
 rmio                     0.4.0      2022-02-17 [1] CRAN (R 4.3.0)
 ROCR                     1.0-11     2020-05-02 [1] CRAN (R 4.3.0)
 Rsamtools                2.18.0     2023-10-24 [1] Bioconductor
 RSpectra                 0.16-1     2022-04-24 [1] CRAN (R 4.3.0)
 RSQLite                  2.3.4      2023-12-08 [1] CRAN (R 4.3.0)
 rstatix                  0.7.2      2023-02-01 [1] CRAN (R 4.3.0)
 rstudioapi               0.15.0     2023-07-07 [1] CRAN (R 4.3.0)
 rtracklayer              1.62.0     2023-10-24 [1] Bioconductor
 Rtsne                    0.17       2023-12-07 [1] CRAN (R 4.3.0)
 rvest                    1.0.3      2022-08-19 [1] CRAN (R 4.3.0)
 S4Arrays                 1.2.0      2023-10-24 [1] Bioconductor
 S4Vectors              * 0.40.2     2023-11-23 [1] Bioconductor
 scales                   1.3.0      2023-11-28 [1] CRAN (R 4.3.0)
 scattermore              1.2        2023-06-12 [1] CRAN (R 4.3.0)
 scLANE                 * 0.7.9      2024-01-07 [1] Bioconductor
 scRNAseq               * 2.16.0     2023-10-26 [1] Bioconductor
 sctransform              0.4.1      2023-10-19 [1] CRAN (R 4.3.0)
 sessioninfo              1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 Seurat                 * 5.0.1      2023-11-17 [1] CRAN (R 4.3.0)
 SeuratObject           * 5.0.1      2023-11-17 [1] CRAN (R 4.3.0)
 shiny                    1.8.0      2023-11-17 [1] CRAN (R 4.3.0)
 SingleCellExperiment   * 1.24.0     2023-10-24 [1] Bioconductor
 slingshot              * 2.10.0     2023-10-24 [1] Bioconductor
 snakecase                0.11.1     2023-08-27 [1] CRAN (R 4.3.0)
 snow                     0.4-4      2021-10-27 [1] CRAN (R 4.3.0)
 sp                     * 2.1-2      2023-11-26 [1] CRAN (R 4.3.0)
 spam                     2.10-0     2023-10-23 [1] CRAN (R 4.3.0)
 SparseArray              1.2.3      2023-12-25 [1] Bioconductor 3.18 (R 4.3.2)
 sparseMatrixStats        1.14.0     2023-10-24 [1] Bioconductor
 spatstat.data            3.0-3      2023-10-24 [1] CRAN (R 4.3.0)
 spatstat.explore         3.2-5      2023-10-22 [1] CRAN (R 4.3.0)
 spatstat.geom            3.2-7      2023-10-20 [1] CRAN (R 4.3.0)
 spatstat.random          3.2-2      2023-11-29 [1] CRAN (R 4.3.0)
 spatstat.sparse          3.0-3      2023-10-24 [1] CRAN (R 4.3.0)
 spatstat.utils           3.0-4      2023-10-24 [1] CRAN (R 4.3.0)
 statmod                  1.5.0      2023-01-06 [1] CRAN (R 4.3.0)
 stringi                  1.8.3      2023-12-11 [1] CRAN (R 4.3.0)
 stringr                  1.5.1      2023-11-14 [1] CRAN (R 4.3.0)
 SummarizedExperiment   * 1.32.0     2023-10-24 [1] Bioconductor
 survival                 3.5-7      2023-08-14 [1] CRAN (R 4.3.2)
 svglite                  2.1.3      2023-12-08 [1] CRAN (R 4.3.0)
 systemfonts              1.0.5      2023-10-09 [1] CRAN (R 4.3.0)
 tensor                   1.5        2012-05-05 [1] CRAN (R 4.3.0)
 tibble                   3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyr                    1.3.0      2023-01-24 [1] CRAN (R 4.3.0)
 tidyselect               1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
 timechange               0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
 TMB                      1.9.10     2023-12-12 [1] CRAN (R 4.3.0)
 TrajectoryUtils        * 1.10.0     2023-10-24 [1] Bioconductor
 tzdb                     0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
 utf8                     1.2.4      2023-10-22 [1] CRAN (R 4.3.0)
 uwot                     0.1.16     2023-06-29 [1] CRAN (R 4.3.0)
 vctrs                    0.6.5      2023-12-01 [1] CRAN (R 4.3.0)
 vipor                    0.4.7      2023-12-18 [1] CRAN (R 4.3.0)
 viridisLite              0.4.2      2023-05-02 [1] CRAN (R 4.3.0)
 vroom                    1.6.5      2023-12-05 [1] CRAN (R 4.3.0)
 webshot                  0.5.5      2023-06-26 [1] CRAN (R 4.3.0)
 withr                    2.5.2      2023-10-30 [1] CRAN (R 4.3.0)
 xfun                     0.41       2023-11-01 [1] CRAN (R 4.3.0)
 XML                      3.99-0.16  2023-11-29 [1] CRAN (R 4.3.0)
 xml2                     1.3.6      2023-12-04 [1] CRAN (R 4.3.0)
 xtable                   1.8-4      2019-04-21 [1] CRAN (R 4.3.0)
 XVector                  0.42.0     2023-10-24 [1] Bioconductor
 yaml                     2.3.8      2023-12-11 [1] CRAN (R 4.3.0)
 zlibbioc                 1.48.0     2023-10-24 [1] Bioconductor
 zoo                      1.8-12     2023-04-13 [1] CRAN (R 4.3.0)

 [1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library

─ Python configuration ───────────────────────────────────────────────────────
 python:         /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/bin/python
 libpython:      /usr/local/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/config-3.11-darwin/libpython3.11.dylib
 pythonhome:     /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site:/Users/jack/Desktop/PhD/Research/Python_Envs/personal_site
 version:        3.11.6 (main, Nov  2 2023, 04:52:24) [Clang 14.0.3 (clang-1403.0.22.14.1)]
 numpy:          /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/lib/python3.11/site-packages/numpy
 numpy_version:  1.23.5
 leidenalg:      /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/lib/python3.11/site-packages/leidenalg
 
 NOTE: Python version was forced by use_python() function

──────────────────────────────────────────────────────────────────────────────